Introduction
In my vectorization using .NET APIs blog, I describe SIMD datatypes Vector64<T> and Vector128<T> that operates on ‘Arm64 hardware intrinsic’ APIs present under System.Runtime.Intrinsics.Arm.AdvSimd and System.Runtime.Intrinsics.Arm.AdvSimd.Arm64 class. In this post I will describe those hardware intrinsic APIs by showing sample code usage along with examples and generated Arm64 code. This will help people in understanding these APIs so they can use them to optimize their .NET code written to target Arm64. Since there are 360 APIs, describing all of them in a single post will be overwhelming. So I have divided these APIs among 8 blogs and will demonstrate 45 APIs in each blog. This is part 1 of that blog series.
Most of the description of these APIs is adapted and referenced from Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile document. You can also refer to the description of SIMD and Floating-point instructions description at Arm developer docs page.
The blog page is programmatically generated and might contain mistakes. If you find any mistake, please leave a comment and I will address it.
APIs covered
1. Abs
Vector64<ushort> Abs(Vector64<short> value)
This method calculates the absolute value of each vector element value, stores in a result vector and returns the result vector.
private Vector64<ushort> AbsTest(Vector64<short> value)
{
  return AdvSimd.Abs(value);
}
// value = <-11, -12, -13, 14>
// Result = <11, 12, 13, 14>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<uint> Abs(Vector64<int> value)
Vector64<byte> Abs(Vector64<sbyte> value)
Vector64<float> Abs(Vector64<float> value)
Vector128<ushort> Abs(Vector128<short> value)
Vector128<uint> Abs(Vector128<int> value)
Vector128<byte> Abs(Vector128<sbyte> value)
Vector128<float> Abs(Vector128<float> value)
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector128<double> Abs(Vector128<double> value)
Vector128<ulong> Abs(Vector128<long> value)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AbsTest(System.Runtime.Intrinsics.Vector64`1[Int16]):System.Runtime.Intrinsics.Vector64`1[UInt16]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            abs     v16.4h, v0.4h
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr
; Total bytes of code 24, prolog size 8
2. AbsoluteCompareGreaterThan
Vector64<float> AbsoluteCompareGreaterThan(Vector64<float> left, Vector64<float> right)
This method performs comparison of absolute value of corresponding vector elements in left with those of right vector and if the left’s value is greater than the right’s value, sets every bit of the corresponding vector element in the  result vector to one, otherwise sets every bit of the corresponding vector element in the result vector to zero and return the result vector.
private Vector64<float> AbsoluteCompareGreaterThanTest(Vector64<float> left, Vector64<float> right)
{
  return AdvSimd.AbsoluteCompareGreaterThan(left, right);
}
// left = <-11.5f, -12.5f>
// right = <10.5f, -22.5f>
// Result = <NaN, 0>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector128<float> AbsoluteCompareGreaterThan(Vector128<float> left, Vector128<float> right)
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector128<double> AbsoluteCompareGreaterThan(Vector128<double> left, Vector128<double> right)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AbsoluteCompareGreaterThanTest(System.Runtime.Intrinsics.Vector64`1[Single],System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector64`1[Single]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            facgt   v16.2s, v0.2s, v1.2s
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr
; Total bytes of code 24, prolog size 8
3. AbsoluteCompareGreaterThanOrEqual
Vector64<float> AbsoluteCompareGreaterThanOrEqual(Vector64<float> left, Vector64<float> right)
This method performs comparison of absolute value of corresponding vector elements in left with those of right  vector and if the left’s value is greater than or equal to the right’s value, sets every bit of the corresponding vector element in the  result vector to one, otherwise sets every bit of the corresponding vector element in the result vector to zero and return the result vector.
private Vector64<float> AbsoluteCompareGreaterThanOrEqualTest(Vector64<float> left, Vector64<float> right)
{
  return AdvSimd.AbsoluteCompareGreaterThanOrEqual(left, right);
}
// left = <-11.5f, -12.5f>
// right = <11.5f, -22.5f>
// Result = <NaN, 0>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector128<float> AbsoluteCompareGreaterThanOrEqual(Vector128<float> left, Vector128<float> right)
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector128<double> AbsoluteCompareGreaterThanOrEqual(Vector128<double> left, Vector128<double> right)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AbsoluteCompareGreaterThanOrEqualTest(System.Runtime.Intrinsics.Vector64`1[Single],System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector64`1[Single]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            facge   v16.2s, v0.2s, v1.2s
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr
; Total bytes of code 24, prolog size 8
4. AbsoluteCompareGreaterThanOrEqualScalar
Vector64<double> AbsoluteCompareGreaterThanOrEqualScalar(Vector64<double> left, Vector64<double> right)
This method compares the absolute value of corresponding vector elements of left and right vector and if the left’s element value is greater than or equal to the right’s element value, sets every bit of the corresponding vector element in the  result vector to one, otherwise sets every bit of the corresponding vector element in the result vector to zero and return the result vector.
private Vector64<double> AbsoluteCompareGreaterThanOrEqualScalarTest(Vector64<double> left, Vector64<double> right)
{
  return AdvSimd.Arm64.AbsoluteCompareGreaterThanOrEqualScalar(left, right);
}
// left = <11.5>
// right = <11.5>
// Result = <NaN>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<float> AbsoluteCompareGreaterThanOrEqualScalar(Vector64<float> left, Vector64<float> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AbsoluteCompareGreaterThanOrEqualScalarTest(System.Runtime.Intrinsics.Vector64`1[Double],System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[Double]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            facge   d16, d0, d1
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr
; Total bytes of code 24, prolog size 8
5. AbsoluteCompareGreaterThanScalar
Vector64<double> AbsoluteCompareGreaterThanScalar(Vector64<double> left, Vector64<double> right)
This method compares the absolute value of corresponding vector elements of left and right vector and if the left’s element value is greater than the right’s element value, sets every bit of the corresponding vector element in the  result vector to one, otherwise sets every bit of the corresponding vector element in the result vector to zero and return the result vector.
private Vector64<double> AbsoluteCompareGreaterThanScalarTest(Vector64<double> left, Vector64<double> right)
{
  return AdvSimd.Arm64.AbsoluteCompareGreaterThanScalar(left, right);
}
// left = <11.5>
// right = <11.5>
// Result = <0>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<float> AbsoluteCompareGreaterThanScalar(Vector64<float> left, Vector64<float> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AbsoluteCompareGreaterThanScalarTest(System.Runtime.Intrinsics.Vector64`1[Double],System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[Double]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            facgt   d16, d0, d1
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr
; Total bytes of code 24, prolog size 8
6. AbsoluteCompareLessThan
Vector64<float> AbsoluteCompareLessThan(Vector64<float> left, Vector64<float> right)
This method performs comparison of absolute value of corresponding vector elements in leftand right vector and if the left’s value is less than to the right’s value, sets every bit of the corresponding vector element in the  result vector to one, otherwise sets every bit of the corresponding vector element in the result vector to zero and return the result vector.
private Vector64<float> AbsoluteCompareLessThanTest(Vector64<float> left, Vector64<float> right)
{
  return AdvSimd.AbsoluteCompareLessThan(left, right);
}
// left = <-11.5f, -12.5f>
// right = <10.5f, -22.5f>
// Result = <0, NaN>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector128<float> AbsoluteCompareLessThan(Vector128<float> left, Vector128<float> right)
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector128<double> AbsoluteCompareLessThan(Vector128<double> left, Vector128<double> right)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AbsoluteCompareLessThanTest(System.Runtime.Intrinsics.Vector64`1[Single],System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector64`1[Single]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            facgt   v16.2s, v1.2s, v0.2s
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr
; Total bytes of code 24, prolog size 8
7. AbsoluteCompareLessThanOrEqual
Vector64<float> AbsoluteCompareLessThanOrEqual(Vector64<float> left, Vector64<float> right)
This method performs comparison of absolute value of each vector element in left with the absolute value of the corresponding vector element in right and if the left’s value is less than or equal to the right’s value, sets every bit of the corresponding vector element in the  result vector to one, otherwise sets every bit of the corresponding vector element in the result vector to zero and return the result vector.
private Vector64<float> AbsoluteCompareLessThanOrEqualTest(Vector64<float> left, Vector64<float> right)
{
  return AdvSimd.AbsoluteCompareLessThanOrEqual(left, right);
}
// left = <-11.5f, -12.5f>
// right = <11.5f, -22.5f>
// Result = <0, NaN>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector128<float> AbsoluteCompareLessThanOrEqual(Vector128<float> left, Vector128<float> right)
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector128<double> AbsoluteCompareLessThanOrEqual(Vector128<double> left, Vector128<double> right)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AbsoluteCompareLessThanOrEqualTest(System.Runtime.Intrinsics.Vector64`1[Single],System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector64`1[Single]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            facge   v16.2s, v1.2s, v0.2s
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr
; Total bytes of code 24, prolog size 8
8. AbsoluteCompareLessThanOrEqualScalar
Vector64<double> AbsoluteCompareLessThanOrEqualScalar(Vector64<double> left, Vector64<double> right)
This method compares the absolute value of corresponding vector elements of left and right vector and if the left’s element value is less than or equal to the right’s element value, sets every bit of the corresponding vector element in the  result vector to one, otherwise sets every bit of the corresponding vector element in the result vector to zero and return the result vector.
private Vector64<double> AbsoluteCompareLessThanOrEqualScalarTest(Vector64<double> left, Vector64<double> right)
{
  return AdvSimd.Arm64.AbsoluteCompareLessThanOrEqualScalar(left, right);
}
// left = <11.5>
// right = <11.5>
// Result = <NaN>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<float> AbsoluteCompareLessThanOrEqualScalar(Vector64<float> left, Vector64<float> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AbsoluteCompareLessThanOrEqualScalarTest(System.Runtime.Intrinsics.Vector64`1[Double],System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[Double]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            facge   d16, d1, d0
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr
; Total bytes of code 24, prolog size 8
9. AbsoluteCompareLessThanScalar
Vector64<double> AbsoluteCompareLessThanScalar(Vector64<double> left, Vector64<double> right)
This method compares the absolute value of corresponding vector elements of left and right vector and if the left’s element value is less than the right’s element value, sets every bit of the corresponding vector element in the  result vector to one, otherwise sets every bit of the corresponding vector element in the result vector to zero and return the result vector.
private Vector64<double> AbsoluteCompareLessThanScalarTest(Vector64<double> left, Vector64<double> right)
{
  return AdvSimd.Arm64.AbsoluteCompareLessThanScalar(left, right);
}
// left = <11.5>
// right = <11.5>
// Result = <0>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<float> AbsoluteCompareLessThanScalar(Vector64<float> left, Vector64<float> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AbsoluteCompareLessThanScalarTest(System.Runtime.Intrinsics.Vector64`1[Double],System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[Double]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            facgt   d16, d1, d0
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr
; Total bytes of code 24, prolog size 8
10. AbsoluteDifference
Vector64<byte> AbsoluteDifference(Vector64<byte> left, Vector64<byte> right)
This method subtracts the corresponding vector elements of right vector from those of left vector, places the absolute values of the results in a result vector, and writes the vector to the result vector.
private Vector64<byte> AbsoluteDifferenceTest(Vector64<byte> left, Vector64<byte> right)
{
  return AdvSimd.AbsoluteDifference(left, right);
}
// left = <11, 12, 13, 14, 15, 16, 17, 18>
// right = <21, 22, 23, 24, 25, 26, 37, 17>
// Result = <10, 10, 10, 10, 10, 10, 20, 1>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<ushort> AbsoluteDifference(Vector64<short> left, Vector64<short> right)
Vector64<uint> AbsoluteDifference(Vector64<int> left, Vector64<int> right)
Vector64<byte> AbsoluteDifference(Vector64<sbyte> left, Vector64<sbyte> right)
Vector64<float> AbsoluteDifference(Vector64<float> left, Vector64<float> right)
Vector64<ushort> AbsoluteDifference(Vector64<ushort> left, Vector64<ushort> right)
Vector64<uint> AbsoluteDifference(Vector64<uint> left, Vector64<uint> right)
Vector128<byte> AbsoluteDifference(Vector128<byte> left, Vector128<byte> right)
Vector128<ushort> AbsoluteDifference(Vector128<short> left, Vector128<short> right)
Vector128<uint> AbsoluteDifference(Vector128<int> left, Vector128<int> right)
Vector128<byte> AbsoluteDifference(Vector128<sbyte> left, Vector128<sbyte> right)
Vector128<float> AbsoluteDifference(Vector128<float> left, Vector128<float> right)
Vector128<ushort> AbsoluteDifference(Vector128<ushort> left, Vector128<ushort> right)
Vector128<uint> AbsoluteDifference(Vector128<uint> left, Vector128<uint> right)
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector128<double> AbsoluteDifference(Vector128<double> left, Vector128<double> right)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AbsoluteDifferenceTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            uabd    v16.8b, v0.8b, v1.8b
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr
; Total bytes of code 24, prolog size 8
11. AbsoluteDifferenceAdd
Vector64<byte> AbsoluteDifferenceAdd(Vector64<byte> addend, Vector64<byte> left, Vector64<byte> right)
This method subtracts the corresponding vector elements of the right vector from those of left vector, and accumulates the absolute values of the results along with the values of addend and returns the accumulated result.
private Vector64<byte> AbsoluteDifferenceAddTest(Vector64<byte> addend, Vector64<byte> left, Vector64<byte> right)
{
  return AdvSimd.AbsoluteDifferenceAdd(addend, left, right);
}
// addend = <11, 12, 13, 14, 15, 16, 17, 18>
// left = <21, 52, 23, 24, 25, 26, 27, 28>
// right = <41, 32, 33, 34, 35, 36, 37, 38>
// Result = <31, 32, 23, 24, 25, 26, 27, 28>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<short> AbsoluteDifferenceAdd(Vector64<short> addend, Vector64<short> left, Vector64<short> right)
Vector64<int> AbsoluteDifferenceAdd(Vector64<int> addend, Vector64<int> left, Vector64<int> right)
Vector64<sbyte> AbsoluteDifferenceAdd(Vector64<sbyte> addend, Vector64<sbyte> left, Vector64<sbyte> right)
Vector64<ushort> AbsoluteDifferenceAdd(Vector64<ushort> addend, Vector64<ushort> left, Vector64<ushort> right)
Vector64<uint> AbsoluteDifferenceAdd(Vector64<uint> addend, Vector64<uint> left, Vector64<uint> right)
Vector128<byte> AbsoluteDifferenceAdd(Vector128<byte> addend, Vector128<byte> left, Vector128<byte> right)
Vector128<short> AbsoluteDifferenceAdd(Vector128<short> addend, Vector128<short> left, Vector128<short> right)
Vector128<int> AbsoluteDifferenceAdd(Vector128<int> addend, Vector128<int> left, Vector128<int> right)
Vector128<sbyte> AbsoluteDifferenceAdd(Vector128<sbyte> addend, Vector128<sbyte> left, Vector128<sbyte> right)
Vector128<ushort> AbsoluteDifferenceAdd(Vector128<ushort> addend, Vector128<ushort> left, Vector128<ushort> right)
Vector128<uint> AbsoluteDifferenceAdd(Vector128<uint> addend, Vector128<uint> left, Vector128<uint> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AbsoluteDifferenceAddTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;  V02 arg2         [V02,T02] (  3,  3   )   simd8  ->   d2         HFA(simd8) 
;# V03 OutArgs      [V03    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            uaba    v0.8b, v1.8b, v2.8b
            ldp     fp, lr, [sp],#16
            ret     lr
; Total bytes of code 20, prolog size 8
12. AbsoluteDifferenceScalar
Vector64<double> AbsoluteDifferenceScalar(Vector64<double> left, Vector64<double> right)
This method subtracts the floating-point values in the elements of the right vector from that of left vector, stores the absolute value of into a result vector, and returns the result vector.
private Vector64<double> AbsoluteDifferenceScalarTest(Vector64<double> left, Vector64<double> right)
{
  return AdvSimd.Arm64.AbsoluteDifferenceScalar(left, right);
}
// left = <11.5>
// right = <16.5>
// Result = <5>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<float> AbsoluteDifferenceScalar(Vector64<float> left, Vector64<float> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AbsoluteDifferenceScalarTest(System.Runtime.Intrinsics.Vector64`1[Double],System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[Double]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            fabd    d16, d0, d1
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr
; Total bytes of code 24, prolog size 8
13. AbsoluteDifferenceWideningLower
Vector128<ushort> AbsoluteDifferenceWideningLower(Vector64<byte> left, Vector64<byte> right)
This method subtracts the corresponding vector elements in the right from those of left and places the absolute value in a result vector and returns the result vector. The result vector Vector128<ushort> as seen in below example is twice the size of input parameter Vector4<byte>.
private Vector128<ushort> AbsoluteDifferenceWideningLowerTest(Vector64<byte> left, Vector64<byte> right)
{
  return AdvSimd.AbsoluteDifferenceWideningLower(left, right);
}
// left = <11, 2, 113, 104, 180, 11, 120, 121>
// right = <21, 22, 23, 24, 25, 26, 27, 28>
// Result = <10, 20, 90, 80, 155, 15, 93, 93>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector128<uint> AbsoluteDifferenceWideningLower(Vector64<short> left, Vector64<short> right)
Vector128<ulong> AbsoluteDifferenceWideningLower(Vector64<int> left, Vector64<int> right)
Vector128<ushort> AbsoluteDifferenceWideningLower(Vector64<sbyte> left, Vector64<sbyte> right)
Vector128<uint> AbsoluteDifferenceWideningLower(Vector64<ushort> left, Vector64<ushort> right)
Vector128<ulong> AbsoluteDifferenceWideningLower(Vector64<uint> left, Vector64<uint> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AbsoluteDifferenceWideningLowerTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector128`1[UInt16]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            uabdl   v16.8h, v0.8b, v1.8b
            mov     v0.16b, v16.16b
            ldp     fp, lr, [sp],#16
            ret     lr
; Total bytes of code 24, prolog size 8
14. AbsoluteDifferenceWideningLowerAndAdd
Vector128<ushort> AbsoluteDifferenceWideningLowerAndAdd(Vector128<ushort> addend, Vector64<byte> left, Vector64<byte> right)
This method subtracts the corresponding vector elements of right from that of left, and accumulates the absolute values of the result along with the elements of  addend and return the accumulated vector. The result vector Vector128<ushort> as seen in below example is twice the size of input parameter Vector4<byte>.
private Vector128<ushort> AbsoluteDifferenceWideningLowerAndAddTest(Vector128<ushort> addend, Vector64<byte> left, Vector64<byte> right)
{
  return AdvSimd.AbsoluteDifferenceWideningLowerAndAdd(addend, left, right);
}
// addend = <100, 200, 300, 100, 100, 100, 100, 100>
// left = <11, 2, 113, 104, 180, 11, 120, 121>
// right = <21, 22, 23, 24, 25, 26, 27, 28>
// Result = <110, 220, 390, 180, 1155, 115, 193, 193>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector128<int> AbsoluteDifferenceWideningLowerAndAdd(Vector128<int> addend, Vector64<short> left, Vector64<short> right)
Vector128<long> AbsoluteDifferenceWideningLowerAndAdd(Vector128<long> addend, Vector64<int> left, Vector64<int> right)
Vector128<short> AbsoluteDifferenceWideningLowerAndAdd(Vector128<short> addend, Vector64<sbyte> left, Vector64<sbyte> right)
Vector128<uint> AbsoluteDifferenceWideningLowerAndAdd(Vector128<uint> addend, Vector64<ushort> left, Vector64<ushort> right)
Vector128<ulong> AbsoluteDifferenceWideningLowerAndAdd(Vector128<ulong> addend, Vector64<uint> left, Vector64<uint> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AbsoluteDifferenceWideningLowerAndAddTest(System.Runtime.Intrinsics.Vector128`1[UInt16],System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector128`1[UInt16]
;
;  V00 arg0         [V00,T00] (  3,  3   )  simd16  ->   d0         HFA(simd16) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;  V02 arg2         [V02,T02] (  3,  3   )   simd8  ->   d2         HFA(simd8) 
;# V03 OutArgs      [V03    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            uabal   v0.8h, v1.8b, v2.8b
            ldp     fp, lr, [sp],#16
            ret     lr
; Total bytes of code 20, prolog size 8
15. AbsoluteDifferenceWideningUpper
Vector128<ushort> AbsoluteDifferenceWideningUpper(Vector128<byte> left, Vector128<byte> right)
This method subtracts the corresponding vector elements in upper half of right  vector from those of left,  places the absolute value of the result in a result vector and returns the result vector. The size of individual element of result Vector128<ushort> as seen in below example is twice the size of input parmeter Vector128<byte>.
private Vector128<ushort> AbsoluteDifferenceWideningUpperTest(Vector128<byte> left, Vector128<byte> right)
{
  return AdvSimd.AbsoluteDifferenceWideningUpper(left, right);
}
// left = <11, 208, 103, 184, 180, 21, 130, 151, 31, 2, 113, 104, 180, 11, 120, 121>
// right = <21, 22, 23, 24, 25, 26, 27, 28, 20, 122, 231, 24, 25, 26, 27, 28>
// Result = <11, 120, 118, 80, 155, 15, 93, 93>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector128<uint> AbsoluteDifferenceWideningUpper(Vector128<short> left, Vector128<short> right)
Vector128<ulong> AbsoluteDifferenceWideningUpper(Vector128<int> left, Vector128<int> right)
Vector128<ushort> AbsoluteDifferenceWideningUpper(Vector128<sbyte> left, Vector128<sbyte> right)
Vector128<uint> AbsoluteDifferenceWideningUpper(Vector128<ushort> left, Vector128<ushort> right)
Vector128<ulong> AbsoluteDifferenceWideningUpper(Vector128<uint> left, Vector128<uint> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AbsoluteDifferenceWideningUpperTest(System.Runtime.Intrinsics.Vector128`1[Byte],System.Runtime.Intrinsics.Vector128`1[Byte]):System.Runtime.Intrinsics.Vector128`1[UInt16]
;
;  V00 arg0         [V00,T00] (  3,  3   )  simd16  ->   d0         HFA(simd16) 
;  V01 arg1         [V01,T01] (  3,  3   )  simd16  ->   d1         HFA(simd16) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            uabdl2  v16.8h, v0.16b, v1.16b
            mov     v0.16b, v16.16b
            ldp     fp, lr, [sp],#16
            ret     lr
; Total bytes of code 24, prolog size 8
16. AbsoluteDifferenceWideningUpperAndAdd
Vector128<ushort> AbsoluteDifferenceWideningUpperAndAdd(Vector128<ushort> addend, Vector128<byte> left, Vector128<byte> right)
This method subtracts the corresponding vector elements in upper half of right from those of left,  accumulates the absolute value of the result along with addened and return the accumulated vector. The size of individual element of result Vector128<ushort> as seen in below example is twice the size of input parmeter Vector128<byte>.
private Vector128<ushort> AbsoluteDifferenceWideningUpperAndAddTest(Vector128<ushort> addend, Vector128<byte> left, Vector128<byte> right)
{
  return AdvSimd.AbsoluteDifferenceWideningUpperAndAdd(addend, left, right);
}
// addend = <100, 200, 300, 100, 100, 100, 100, 100>
// left = <11, 208, 103, 184, 180, 21, 130, 151, 31, 2, 113, 104, 180, 11, 120, 121>
// right = <21, 22, 23, 24, 25, 26, 27, 28, 20, 122, 231, 24, 25, 26, 27, 28>
// Result = <111, 320, 418, 180, 255, 115, 193, 193>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector128<int> AbsoluteDifferenceWideningUpperAndAdd(Vector128<int> addend, Vector128<short> left, Vector128<short> right)
Vector128<long> AbsoluteDifferenceWideningUpperAndAdd(Vector128<long> addend, Vector128<int> left, Vector128<int> right)
Vector128<short> AbsoluteDifferenceWideningUpperAndAdd(Vector128<short> addend, Vector128<sbyte> left, Vector128<sbyte> right)
Vector128<uint> AbsoluteDifferenceWideningUpperAndAdd(Vector128<uint> addend, Vector128<ushort> left, Vector128<ushort> right)
Vector128<ulong> AbsoluteDifferenceWideningUpperAndAdd(Vector128<ulong> addend, Vector128<uint> left, Vector128<uint> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AbsoluteDifferenceWideningUpperAndAddTest(System.Runtime.Intrinsics.Vector128`1[UInt16],System.Runtime.Intrinsics.Vector128`1[Byte],System.Runtime.Intrinsics.Vector128`1[Byte]):System.Runtime.Intrinsics.Vector128`1[UInt16]
;
;  V00 arg0         [V00,T00] (  3,  3   )  simd16  ->   d0         HFA(simd16) 
;  V01 arg1         [V01,T01] (  3,  3   )  simd16  ->   d1         HFA(simd16) 
;  V02 arg2         [V02,T02] (  3,  3   )  simd16  ->   d2         HFA(simd16) 
;# V03 OutArgs      [V03    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            uabal2  v0.8h, v1.16b, v2.16b
            ldp     fp, lr, [sp],#16
            ret     lr
; Total bytes of code 20, prolog size 8
17. AbsSaturate
Vector64<short> AbsSaturate(Vector64<short> value)
This method calculates saturated absolute value of each vector element of value. If any element’s absolute value is outside the range, the result is saturated. In below example, 1st lane value is -32768 which is ushort.MinValue. It’s absolute value would be 32768, but since it is out of range, it is saturated to 32767 which is ushort.MaxValue.
private Vector64<short> AbsSaturateTest(Vector64<short> value)
{
  return AdvSimd.AbsSaturate(value);
}
// value = <-32768, -12, -13, 32767>
// Result = <32767, 12, 13, 32767>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<int> AbsSaturate(Vector64<int> value)
Vector64<sbyte> AbsSaturate(Vector64<sbyte> value)
Vector128<short> AbsSaturate(Vector128<short> value)
Vector128<int> AbsSaturate(Vector128<int> value)
Vector128<sbyte> AbsSaturate(Vector128<sbyte> value)
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector128<long> AbsSaturate(Vector128<long> value)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AbsSaturateTest(System.Runtime.Intrinsics.Vector64`1[Int16]):System.Runtime.Intrinsics.Vector64`1[Int16]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            sqabs   v16.4h, v0.4h
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr
; Total bytes of code 24, prolog size 8
18. AbsSaturateScalar
Vector64<short> AbsSaturateScalar(Vector64<short> value)
This method reads 0th vector element from the value vector, stores the absolute value of the result into a result vector and returns the result vector. This method operates on signed integer values.
private Vector64<short> AbsSaturateScalarTest(Vector64<short> value)
{
  return AdvSimd.Arm64.AbsSaturateScalar(value);
}
// value = <11, 12, 13, 14>
// Result = <11, 0, 0, 0>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<int> AbsSaturateScalar(Vector64<int> value)
Vector64<long> AbsSaturateScalar(Vector64<long> value)
Vector64<sbyte> AbsSaturateScalar(Vector64<sbyte> value)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AbsSaturateScalarTest(System.Runtime.Intrinsics.Vector64`1[Int16]):System.Runtime.Intrinsics.Vector64`1[Int16]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            sqabs   h16, h0
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr
; Total bytes of code 24, prolog size 8
19. AbsScalar
Vector64<double> AbsScalar(Vector64<double> value)
This method calculates floating-point absolute value, similar to Abs() and stores them in result vector and return the result vector.
private Vector64<double> AbsScalarTest(Vector64<double> value)
{
  return AdvSimd.AbsScalar(value);
}
// value = <-11.5>
// Result = <11.5>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<float> AbsScalar(Vector64<float> value)
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<ulong> AbsScalar(Vector64<long> value)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AbsScalarTest(System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[Double]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            fabs    d16, d0
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr
; Total bytes of code 24, prolog size 8
20. Add
Vector64<byte> Add(Vector64<byte> left, Vector64<byte> right)
This method adds the corresponding vector elements in the left and right vector, and returns the result vector.
private Vector64<byte> AddTest(Vector64<byte> left, Vector64<byte> right)
{
  return AdvSimd.Add(left, right);
}
// left = <11, 12, 13, 14, 15, 16, 17, 18>
// right = <21, 22, 23, 24, 25, 26, 27, 28>
// Result = <32, 34, 36, 38, 40, 42, 44, 46>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<short> Add(Vector64<short> left, Vector64<short> right)
Vector64<int> Add(Vector64<int> left, Vector64<int> right)
Vector64<sbyte> Add(Vector64<sbyte> left, Vector64<sbyte> right)
Vector64<float> Add(Vector64<float> left, Vector64<float> right)
Vector64<ushort> Add(Vector64<ushort> left, Vector64<ushort> right)
Vector64<uint> Add(Vector64<uint> left, Vector64<uint> right)
Vector128<byte> Add(Vector128<byte> left, Vector128<byte> right)
Vector128<short> Add(Vector128<short> left, Vector128<short> right)
Vector128<int> Add(Vector128<int> left, Vector128<int> right)
Vector128<long> Add(Vector128<long> left, Vector128<long> right)
Vector128<sbyte> Add(Vector128<sbyte> left, Vector128<sbyte> right)
Vector128<float> Add(Vector128<float> left, Vector128<float> right)
Vector128<ushort> Add(Vector128<ushort> left, Vector128<ushort> right)
Vector128<uint> Add(Vector128<uint> left, Vector128<uint> right)
Vector128<ulong> Add(Vector128<ulong> left, Vector128<ulong> right)
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector128<double> Add(Vector128<double> left, Vector128<double> right)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AddTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            add     v16.8b, v0.8b, v1.8b
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr
; Total bytes of code 24, prolog size 8
21. AddAcross
Vector64<byte> AddAcross(Vector64<byte> value)
This method adds every vector element in the value vector together, and writes the result to the 0th element of result vector, while other elements of result vector set to 0.
private Vector64<byte> AddAcrossTest(Vector64<byte> value)
{
  return AdvSimd.Arm64.AddAcross(value);
}
// value = <11, 12, 13, 14, 15, 16, 17, 18>
// Result = <116, 0, 0, 0, 0, 0, 0, 0>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<short> AddAcross(Vector64<short> value)
Vector64<sbyte> AddAcross(Vector64<sbyte> value)
Vector64<ushort> AddAcross(Vector64<ushort> value)
Vector64<byte> AddAcross(Vector128<byte> value)
Vector64<short> AddAcross(Vector128<short> value)
Vector64<int> AddAcross(Vector128<int> value)
Vector64<sbyte> AddAcross(Vector128<sbyte> value)
Vector64<ushort> AddAcross(Vector128<ushort> value)
Vector64<uint> AddAcross(Vector128<uint> value)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AddAcrossTest(System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            addv    b16, v0.8b
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr
; Total bytes of code 24, prolog size 8
22. AddAcrossWidening
Vector64<ushort> AddAcrossWidening(Vector64<byte> value)
This method adds every vector element in the value vector together,  and writes the result to the 0th element of result vector, while other elements of result vector set to 0.  As seen in below example, the result vector’s element size ushort is twice as long as the input parameter’s element size byte.
private Vector64<ushort> AddAcrossWideningTest(Vector64<byte> value)
{
  return AdvSimd.Arm64.AddAcrossWidening(value);
}
// value = <11, 12, 13, 14, 15, 16, 17, 18>
// Result = <116, 0, 0, 0>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<int> AddAcrossWidening(Vector64<short> value)
Vector64<short> AddAcrossWidening(Vector64<sbyte> value)
Vector64<uint> AddAcrossWidening(Vector64<ushort> value)
Vector64<ushort> AddAcrossWidening(Vector128<byte> value)
Vector64<int> AddAcrossWidening(Vector128<short> value)
Vector64<long> AddAcrossWidening(Vector128<int> value)
Vector64<short> AddAcrossWidening(Vector128<sbyte> value)
Vector64<uint> AddAcrossWidening(Vector128<ushort> value)
Vector64<ulong> AddAcrossWidening(Vector128<uint> value)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AddAcrossWideningTest(System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[UInt16]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            uaddlv  h16, v0.8b
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr
; Total bytes of code 24, prolog size 8
23. AddHighNarrowingLower
Vector64<byte> AddHighNarrowingLower(Vector128<ushort> left, Vector128<ushort> right)
This method adds corresponding vector elements in the left and right vector, places the most significant half of the result into the result vector and return the result vector. As seen in below example, elements in result vector Vector64<byte> is half the size of that of input Vector128<ushort> although number of total elements are same.
private Vector64<byte> AddHighNarrowingLowerTest(Vector128<ushort> left, Vector128<ushort> right)
{
  return AdvSimd.AddHighNarrowingLower(left, right);
}
// left = <100, 200, 300, 400, 500, 600, 700, 800>
// right = <900, 1000, 1100, 1200, 1300, 1400, 1500, 1600>
// Result = <3, 4, 5, 6, 7, 7, 8, 9>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<short> AddHighNarrowingLower(Vector128<int> left, Vector128<int> right)
Vector64<int> AddHighNarrowingLower(Vector128<long> left, Vector128<long> right)
Vector64<sbyte> AddHighNarrowingLower(Vector128<short> left, Vector128<short> right)
Vector64<ushort> AddHighNarrowingLower(Vector128<uint> left, Vector128<uint> right)
Vector64<uint> AddHighNarrowingLower(Vector128<ulong> left, Vector128<ulong> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AddHighNarrowingLowerTest(System.Runtime.Intrinsics.Vector128`1[UInt16],System.Runtime.Intrinsics.Vector128`1[UInt16]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
;  V00 arg0         [V00,T00] (  3,  3   )  simd16  ->   d0         HFA(simd16) 
;  V01 arg1         [V01,T01] (  3,  3   )  simd16  ->   d1         HFA(simd16) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            addhn   v16.8b, v0.8h, v1.8h
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr
; Total bytes of code 24, prolog size 8
24. AddHighNarrowingUpper
Vector128<byte> AddHighNarrowingUpper(Vector64<byte> lower, Vector128<ushort> left, Vector128<ushort> right)
This method adds corresponding vector elements in the left and right vector, places the most significant half of the result into upper half of the result vector while the lower half of vector is set to the elements in lower.
private Vector128<byte> AddHighNarrowingUpperTest(Vector64<byte> lower, Vector128<ushort> left, Vector128<ushort> right)
{
  return AdvSimd.AddHighNarrowingUpper(lower, left, right);
}
// lower = <1, 255, 13, 41, 54, 61, 71, 18>
// left = <100, 200, 300, 400, 500, 600, 700, 800>
// right = <900, 1000, 1100, 1200, 1300, 1400, 1500, 1600>
// Result = <1, 255, 13, 41, 54, 61, 71, 18, 3, 4, 5, 6, 7, 7, 8, 9>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector128<short> AddHighNarrowingUpper(Vector64<short> lower, Vector128<int> left, Vector128<int> right)
Vector128<int> AddHighNarrowingUpper(Vector64<int> lower, Vector128<long> left, Vector128<long> right)
Vector128<sbyte> AddHighNarrowingUpper(Vector64<sbyte> lower, Vector128<short> left, Vector128<short> right)
Vector128<ushort> AddHighNarrowingUpper(Vector64<ushort> lower, Vector128<uint> left, Vector128<uint> right)
Vector128<uint> AddHighNarrowingUpper(Vector64<uint> lower, Vector128<ulong> left, Vector128<ulong> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AddHighNarrowingUpperTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector128`1[UInt16],System.Runtime.Intrinsics.Vector128`1[UInt16]):System.Runtime.Intrinsics.Vector128`1[Byte]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )  simd16  ->   d1         HFA(simd16) 
;  V02 arg2         [V02,T02] (  3,  3   )  simd16  ->   d2         HFA(simd16) 
;# V03 OutArgs      [V03    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            addhn2  v0.16b, v1.8h, v2.8h
            ldp     fp, lr, [sp],#16
            ret     lr
; Total bytes of code 20, prolog size 8
25. AddPairwise
Vector64<byte> AddPairwise(Vector64<byte> left, Vector64<byte> right)
This method  creates a vector by concatenating the vector elements of left vector followed by those of the right vector, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values and places them in result vector and returns the result vector.
private Vector64<byte> AddPairwiseTest(Vector64<byte> left, Vector64<byte> right)
{
  return AdvSimd.AddPairwise(left, right);
}
// left = <11, 12, 13, 14, 15, 16, 17, 18>
// right = <21, 22, 23, 24, 25, 26, 27, 28>
// Result = <23, 27, 31, 35, 43, 47, 51, 55>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<short> AddPairwise(Vector64<short> left, Vector64<short> right)
Vector64<int> AddPairwise(Vector64<int> left, Vector64<int> right)
Vector64<sbyte> AddPairwise(Vector64<sbyte> left, Vector64<sbyte> right)
Vector64<float> AddPairwise(Vector64<float> left, Vector64<float> right)
Vector64<ushort> AddPairwise(Vector64<ushort> left, Vector64<ushort> right)
Vector64<uint> AddPairwise(Vector64<uint> left, Vector64<uint> right)
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector128<byte> AddPairwise(Vector128<byte> left, Vector128<byte> right)
Vector128<double> AddPairwise(Vector128<double> left, Vector128<double> right)
Vector128<short> AddPairwise(Vector128<short> left, Vector128<short> right)
Vector128<int> AddPairwise(Vector128<int> left, Vector128<int> right)
Vector128<long> AddPairwise(Vector128<long> left, Vector128<long> right)
Vector128<sbyte> AddPairwise(Vector128<sbyte> left, Vector128<sbyte> right)
Vector128<float> AddPairwise(Vector128<float> left, Vector128<float> right)
Vector128<ushort> AddPairwise(Vector128<ushort> left, Vector128<ushort> right)
Vector128<uint> AddPairwise(Vector128<uint> left, Vector128<uint> right)
Vector128<ulong> AddPairwise(Vector128<ulong> left, Vector128<ulong> right)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AddPairwiseTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            addp    v16.8b, v0.8b, v1.8b
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr
; Total bytes of code 24, prolog size 8
26. AddPairwiseScalar
Vector64<float> AddPairwiseScalar(Vector64<float> value)
This method adds  vector elements in the value vector and  writes the result to the 0th element of result vector, while other elements of result vector set to 0.
private Vector64<float> AddPairwiseScalarTest(Vector64<float> value)
{
  return AdvSimd.Arm64.AddPairwiseScalar(value);
}
// value = <11.5, 12.5>
// Result = <24, 0>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<double> AddPairwiseScalar(Vector128<double> value)
Vector64<long> AddPairwiseScalar(Vector128<long> value)
Vector64<ulong> AddPairwiseScalar(Vector128<ulong> value)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AddPairwiseScalarTest(System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector64`1[Single]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            faddp   s16, v0.2s
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr
; Total bytes of code 24, prolog size 8
27. AddPairwiseWidening
Vector64<ushort> AddPairwiseWidening(Vector64<byte> value)
This method adds pairs of adjacent integer values from the value vector, stores them in a result vector and returns the vector. As seen in example below, the result vector elements ushort is twice as long as the input’s vector elements size byte.
private Vector64<ushort> AddPairwiseWideningTest(Vector64<byte> value)
{
  return AdvSimd.AddPairwiseWidening(value);
}
// value = <11, 12, 13, 14, 15, 16, 17, 18>
// Result = <23, 27, 31, 35>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<int> AddPairwiseWidening(Vector64<short> value)
Vector64<short> AddPairwiseWidening(Vector64<sbyte> value)
Vector64<uint> AddPairwiseWidening(Vector64<ushort> value)
Vector128<ushort> AddPairwiseWidening(Vector128<byte> value)
Vector128<int> AddPairwiseWidening(Vector128<short> value)
Vector128<long> AddPairwiseWidening(Vector128<int> value)
Vector128<short> AddPairwiseWidening(Vector128<sbyte> value)
Vector128<uint> AddPairwiseWidening(Vector128<ushort> value)
Vector128<ulong> AddPairwiseWidening(Vector128<uint> value)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AddPairwiseWideningTest(System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[UInt16]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            uaddlp  v16.4h, v0.8b
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr
; Total bytes of code 24, prolog size 8
28. AddPairwiseWideningAndAdd
Vector64<ushort> AddPairwiseWideningAndAdd(Vector64<ushort> addend, Vector64<byte> value)
This method adds pairs of adjacent integer values from the value vector and accumulates the results with those of addend vector and return the result vector. As seen in below example, the result vector element size ushort is twice as long as that of input parameter’s size byte.
private Vector64<ushort> AddPairwiseWideningAndAddTest(Vector64<ushort> addend, Vector64<byte> value)
{
  return AdvSimd.AddPairwiseWideningAndAdd(addend, value);
}
// addend = <11, 12, 13, 14>
// value = <11, 12, 13, 14, 15, 16, 17, 18>
// Result = <34, 39, 44, 49>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<int> AddPairwiseWideningAndAdd(Vector64<int> addend, Vector64<short> value)
Vector64<short> AddPairwiseWideningAndAdd(Vector64<short> addend, Vector64<sbyte> value)
Vector64<uint> AddPairwiseWideningAndAdd(Vector64<uint> addend, Vector64<ushort> value)
Vector128<ushort> AddPairwiseWideningAndAdd(Vector128<ushort> addend, Vector128<byte> value)
Vector128<int> AddPairwiseWideningAndAdd(Vector128<int> addend, Vector128<short> value)
Vector128<long> AddPairwiseWideningAndAdd(Vector128<long> addend, Vector128<int> value)
Vector128<short> AddPairwiseWideningAndAdd(Vector128<short> addend, Vector128<sbyte> value)
Vector128<uint> AddPairwiseWideningAndAdd(Vector128<uint> addend, Vector128<ushort> value)
Vector128<ulong> AddPairwiseWideningAndAdd(Vector128<ulong> addend, Vector128<uint> value)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AddPairwiseWideningAndAddTest(System.Runtime.Intrinsics.Vector64`1[UInt16],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[UInt16]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            uadalp  v0.4h, v1.8b
            ldp     fp, lr, [sp],#16
            ret     lr
; Total bytes of code 20, prolog size 8
29. AddPairwiseWideningAndAddScalar
Vector64<long> AddPairwiseWideningAndAddScalar(Vector64<long> addend, Vector64<int> value)
This method adds pairs of adjacent integer values from value vector and accumulates the results with the vector elements of addend and returns the result vector. As seen in below example, the result vector element’s size long is twice as long as that of value vector element’s size int.
private Vector64<long> AddPairwiseWideningAndAddScalarTest(Vector64<long> addend, Vector64<int> value)
{
  return AdvSimd.AddPairwiseWideningAndAddScalar(addend, value);
}
// addend = <11>
// value = <11, 12>
// Result = <34>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<ulong> AddPairwiseWideningAndAddScalar(Vector64<ulong> addend, Vector64<uint> value)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AddPairwiseWideningAndAddScalarTest(System.Runtime.Intrinsics.Vector64`1[Int64],System.Runtime.Intrinsics.Vector64`1[Int32]):System.Runtime.Intrinsics.Vector64`1[Int64]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            sadalp  v0.1d, v1.2s
            ldp     fp, lr, [sp],#16
            ret     lr
; Total bytes of code 20, prolog size 8
30. AddPairwiseWideningScalar
Vector64<long> AddPairwiseWideningScalar(Vector64<int> value)
This method adds pairs of adjacent integer values from the value vector, stores them in a result vector and returns the result vector. As seen in below example, the result vector element’s size long is twice as long as the input value’s element size int.
private Vector64<long> AddPairwiseWideningScalarTest(Vector64<int> value)
{
  return AdvSimd.AddPairwiseWideningScalar(value);
}
// value = <11, 12>
// Result = <23>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<ulong> AddPairwiseWideningScalar(Vector64<uint> value)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AddPairwiseWideningScalarTest(System.Runtime.Intrinsics.Vector64`1[Int32]):System.Runtime.Intrinsics.Vector64`1[Int64]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            saddlp  v16.1d, v0.2s
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr
; Total bytes of code 24, prolog size 8
31. AddRoundedHighNarrowingLower
Vector64<byte> AddRoundedHighNarrowingLower(Vector128<ushort> left, Vector128<ushort> right)
This method adds corresponding vector elements in left vector to those of right vector, stores the most significant half of the result into the result vector such that the result is rounded and return the result.
private Vector64<byte> AddRoundedHighNarrowingLowerTest(Vector128<ushort> left, Vector128<ushort> right)
{
  return AdvSimd.AddRoundedHighNarrowingLower(left, right);
}
// left = <100, 200, 300, 400, 500, 600, 700, 800>
// right = <900, 1000, 1100, 1200, 1300, 1400, 1500, 1600>
// Result = <4, 5, 5, 6, 7, 8, 9, 9>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<short> AddRoundedHighNarrowingLower(Vector128<int> left, Vector128<int> right)
Vector64<int> AddRoundedHighNarrowingLower(Vector128<long> left, Vector128<long> right)
Vector64<sbyte> AddRoundedHighNarrowingLower(Vector128<short> left, Vector128<short> right)
Vector64<ushort> AddRoundedHighNarrowingLower(Vector128<uint> left, Vector128<uint> right)
Vector64<uint> AddRoundedHighNarrowingLower(Vector128<ulong> left, Vector128<ulong> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AddRoundedHighNarrowingLowerTest(System.Runtime.Intrinsics.Vector128`1[UInt16],System.Runtime.Intrinsics.Vector128`1[UInt16]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
;  V00 arg0         [V00,T00] (  3,  3   )  simd16  ->   d0         HFA(simd16) 
;  V01 arg1         [V01,T01] (  3,  3   )  simd16  ->   d1         HFA(simd16) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            raddhn  v16.8b, v0.8h, v1.8h
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr
; Total bytes of code 24, prolog size 8
32. AddRoundedHighNarrowingUpper
Vector128<byte> AddRoundedHighNarrowingUpper(Vector64<byte> lower, Vector128<ushort> left, Vector128<ushort> right)
This method adds corresponding vector elements in left vector to those of right vector, places the most significant half of the result (after rounding) into the upper half of the result vector while the lower half is set to the elements in lower.
private Vector128<byte> AddRoundedHighNarrowingUpperTest(Vector64<byte> lower, Vector128<ushort> left, Vector128<ushort> right)
{
  return AdvSimd.AddRoundedHighNarrowingUpper(lower, left, right);
}
// lower = <1, 255, 13, 41, 54, 61, 71, 18>
// left = <100, 200, 300, 400, 500, 600, 700, 800>
// right = <900, 1000, 1100, 1200, 1300, 1400, 1500, 1600>
// Result = <1, 255, 13, 41, 54, 61, 71, 18, 4, 5, 5, 6, 7, 8, 9, 9>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector128<short> AddRoundedHighNarrowingUpper(Vector64<short> lower, Vector128<int> left, Vector128<int> right)
Vector128<int> AddRoundedHighNarrowingUpper(Vector64<int> lower, Vector128<long> left, Vector128<long> right)
Vector128<sbyte> AddRoundedHighNarrowingUpper(Vector64<sbyte> lower, Vector128<short> left, Vector128<short> right)
Vector128<ushort> AddRoundedHighNarrowingUpper(Vector64<ushort> lower, Vector128<uint> left, Vector128<uint> right)
Vector128<uint> AddRoundedHighNarrowingUpper(Vector64<uint> lower, Vector128<ulong> left, Vector128<ulong> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AddRoundedHighNarrowingUpperTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector128`1[UInt16],System.Runtime.Intrinsics.Vector128`1[UInt16]):System.Runtime.Intrinsics.Vector128`1[Byte]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )  simd16  ->   d1         HFA(simd16) 
;  V02 arg2         [V02,T02] (  3,  3   )  simd16  ->   d2         HFA(simd16) 
;# V03 OutArgs      [V03    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            raddhn2 v0.16b, v1.8h, v2.8h
            ldp     fp, lr, [sp],#16
            ret     lr
; Total bytes of code 20, prolog size 8
33. AddSaturate
Vector64<byte> AddSaturate(Vector64<byte> left, Vector64<byte> right)
This method adds the values of corresponding elements of the left and right vectors, stores the results in a vector and returns the result vector. If overflow occurs with any of the results, those results are saturated.
private Vector64<byte> AddSaturateTest(Vector64<byte> left, Vector64<byte> right)
{
  return AdvSimd.AddSaturate(left, right);
}
// left = <155, 200, 200, 1, 5, 16, 17, 18>
// right = <155, 100, 100, 2, 25, 26, 27, 28>
// Result = <255, 255, 255, 3, 30, 42, 44, 46>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<short> AddSaturate(Vector64<short> left, Vector64<short> right)
Vector64<int> AddSaturate(Vector64<int> left, Vector64<int> right)
Vector64<sbyte> AddSaturate(Vector64<sbyte> left, Vector64<sbyte> right)
Vector64<ushort> AddSaturate(Vector64<ushort> left, Vector64<ushort> right)
Vector64<uint> AddSaturate(Vector64<uint> left, Vector64<uint> right)
Vector128<byte> AddSaturate(Vector128<byte> left, Vector128<byte> right)
Vector128<short> AddSaturate(Vector128<short> left, Vector128<short> right)
Vector128<int> AddSaturate(Vector128<int> left, Vector128<int> right)
Vector128<long> AddSaturate(Vector128<long> left, Vector128<long> right)
Vector128<sbyte> AddSaturate(Vector128<sbyte> left, Vector128<sbyte> right)
Vector128<ushort> AddSaturate(Vector128<ushort> left, Vector128<ushort> right)
Vector128<uint> AddSaturate(Vector128<uint> left, Vector128<uint> right)
Vector128<ulong> AddSaturate(Vector128<ulong> left, Vector128<ulong> right)
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<byte> AddSaturate(Vector64<byte> left, Vector64<sbyte> right)
Vector64<short> AddSaturate(Vector64<short> left, Vector64<ushort> right)
Vector64<int> AddSaturate(Vector64<int> left, Vector64<uint> right)
Vector64<sbyte> AddSaturate(Vector64<sbyte> left, Vector64<byte> right)
Vector64<ushort> AddSaturate(Vector64<ushort> left, Vector64<short> right)
Vector64<uint> AddSaturate(Vector64<uint> left, Vector64<int> right)
Vector128<byte> AddSaturate(Vector128<byte> left, Vector128<sbyte> right)
Vector128<short> AddSaturate(Vector128<short> left, Vector128<ushort> right)
Vector128<int> AddSaturate(Vector128<int> left, Vector128<uint> right)
Vector128<long> AddSaturate(Vector128<long> left, Vector128<ulong> right)
Vector128<sbyte> AddSaturate(Vector128<sbyte> left, Vector128<byte> right)
Vector128<ushort> AddSaturate(Vector128<ushort> left, Vector128<short> right)
Vector128<uint> AddSaturate(Vector128<uint> left, Vector128<int> right)
Vector128<ulong> AddSaturate(Vector128<ulong> left, Vector128<long> right)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AddSaturateTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            uqadd   v16.8b, v0.8b, v1.8b
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr
; Total bytes of code 24, prolog size 8
34. AddSaturateScalar
Vector64<long> AddSaturateScalar(Vector64<long> left, Vector64<long> right)
This method scalar variant, adds the values of corresponding elements of the left and right vectors, stores the results in a vector and returns the result vector. If overflow occurs with any of the results, those results are saturated.
private Vector64<long> AddSaturateScalarTest(Vector64<long> left, Vector64<long> right)
{
  return AdvSimd.AddSaturateScalar(left, right);
}
// left = <11>
// right = <11>
// Result = <22>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<ulong> AddSaturateScalar(Vector64<ulong> left, Vector64<ulong> right)
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<byte> AddSaturateScalar(Vector64<byte> left, Vector64<byte> right)
Vector64<byte> AddSaturateScalar(Vector64<byte> left, Vector64<sbyte> right)
Vector64<short> AddSaturateScalar(Vector64<short> left, Vector64<short> right)
Vector64<short> AddSaturateScalar(Vector64<short> left, Vector64<ushort> right)
Vector64<int> AddSaturateScalar(Vector64<int> left, Vector64<int> right)
Vector64<int> AddSaturateScalar(Vector64<int> left, Vector64<uint> right)
Vector64<long> AddSaturateScalar(Vector64<long> left, Vector64<ulong> right)
Vector64<sbyte> AddSaturateScalar(Vector64<sbyte> left, Vector64<sbyte> right)
Vector64<sbyte> AddSaturateScalar(Vector64<sbyte> left, Vector64<byte> right)
Vector64<ushort> AddSaturateScalar(Vector64<ushort> left, Vector64<ushort> right)
Vector64<ushort> AddSaturateScalar(Vector64<ushort> left, Vector64<short> right)
Vector64<uint> AddSaturateScalar(Vector64<uint> left, Vector64<uint> right)
Vector64<uint> AddSaturateScalar(Vector64<uint> left, Vector64<int> right)
Vector64<ulong> AddSaturateScalar(Vector64<ulong> left, Vector64<long> right)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AddSaturateScalarTest(System.Runtime.Intrinsics.Vector64`1[Int64],System.Runtime.Intrinsics.Vector64`1[Int64]):System.Runtime.Intrinsics.Vector64`1[Int64]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            sqadd   d16, d0, d1
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr
; Total bytes of code 24, prolog size 8
35. AddScalar
Vector64<double> AddScalar(Vector64<double> left, Vector64<double> right)
This method adds the floating-point values of the two source vectors, and writes the result to the result. This performs scalar operation.
private Vector64<double> AddScalarTest(Vector64<double> left, Vector64<double> right)
{
  return AdvSimd.AddScalar(left, right);
}
// left = <11.5>
// right = <11.5>
// Result = <23>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<long> AddScalar(Vector64<long> left, Vector64<long> right)
Vector64<float> AddScalar(Vector64<float> left, Vector64<float> right)
Vector64<ulong> AddScalar(Vector64<ulong> left, Vector64<ulong> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AddScalarTest(System.Runtime.Intrinsics.Vector64`1[Double],System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[Double]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            fadd    d16, d0, d1
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr
; Total bytes of code 24, prolog size 8
36. AddWideningLower
Vector128<ushort> AddWideningLower(Vector64<byte> left, Vector64<byte> right)
This method adds corresponding vector elements in the left to those of right vector, stores the result in a vector, and returns the result vector. As seen in below example, the result vector element’s size ushort is twice that of input parameter element size byte.
private Vector128<ushort> AddWideningLowerTest(Vector64<byte> left, Vector64<byte> right)
{
  return AdvSimd.AddWideningLower(left, right);
}
// left = <155, 200, 200, 1, 5, 16, 17, 18>
// right = <155, 100, 100, 2, 25, 26, 27, 28>
// Result = <310, 300, 300, 3, 30, 42, 44, 46>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector128<int> AddWideningLower(Vector64<short> left, Vector64<short> right)
Vector128<long> AddWideningLower(Vector64<int> left, Vector64<int> right)
Vector128<short> AddWideningLower(Vector64<sbyte> left, Vector64<sbyte> right)
Vector128<uint> AddWideningLower(Vector64<ushort> left, Vector64<ushort> right)
Vector128<ulong> AddWideningLower(Vector64<uint> left, Vector64<uint> right)
Vector128<short> AddWideningLower(Vector128<short> left, Vector64<sbyte> right)
Vector128<int> AddWideningLower(Vector128<int> left, Vector64<short> right)
Vector128<long> AddWideningLower(Vector128<long> left, Vector64<int> right)
Vector128<ushort> AddWideningLower(Vector128<ushort> left, Vector64<byte> right)
Vector128<uint> AddWideningLower(Vector128<uint> left, Vector64<ushort> right)
Vector128<ulong> AddWideningLower(Vector128<ulong> left, Vector64<uint> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AddWideningLowerTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector128`1[UInt16]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            uaddl   v16.8h, v0.8b, v1.8b
            mov     v0.16b, v16.16b
            ldp     fp, lr, [sp],#16
            ret     lr
; Total bytes of code 24, prolog size 8
37. AddWideningUpper
Vector128<ushort> AddWideningUpper(Vector128<byte> left, Vector128<byte> right)
This method adds corresponding vector elements in the upper half of left to those of right vector, stores the result into a result vector, and returns the result vector. As seen in below example, the result vector element’s size ushort is twice as long as the input parameter’s element size byte.
private Vector128<ushort> AddWideningUpperTest(Vector128<byte> left, Vector128<byte> right)
{
  return AdvSimd.AddWideningUpper(left, right);
}
// left = <11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26>
// right = <21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36>
// Result = <48, 50, 52, 54, 56, 58, 60, 62>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector128<int> AddWideningUpper(Vector128<short> left, Vector128<short> right)
Vector128<short> AddWideningUpper(Vector128<short> left, Vector128<sbyte> right)
Vector128<int> AddWideningUpper(Vector128<int> left, Vector128<short> right)
Vector128<long> AddWideningUpper(Vector128<int> left, Vector128<int> right)
Vector128<long> AddWideningUpper(Vector128<long> left, Vector128<int> right)
Vector128<short> AddWideningUpper(Vector128<sbyte> left, Vector128<sbyte> right)
Vector128<ushort> AddWideningUpper(Vector128<ushort> left, Vector128<byte> right)
Vector128<uint> AddWideningUpper(Vector128<ushort> left, Vector128<ushort> right)
Vector128<uint> AddWideningUpper(Vector128<uint> left, Vector128<ushort> right)
Vector128<ulong> AddWideningUpper(Vector128<uint> left, Vector128<uint> right)
Vector128<ulong> AddWideningUpper(Vector128<ulong> left, Vector128<uint> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AddWideningUpperTest(System.Runtime.Intrinsics.Vector128`1[Byte],System.Runtime.Intrinsics.Vector128`1[Byte]):System.Runtime.Intrinsics.Vector128`1[UInt16]
;
;  V00 arg0         [V00,T00] (  3,  3   )  simd16  ->   d0         HFA(simd16) 
;  V01 arg1         [V01,T01] (  3,  3   )  simd16  ->   d1         HFA(simd16) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            uaddl2  v16.8h, v0.16b, v1.16b
            mov     v0.16b, v16.16b
            ldp     fp, lr, [sp],#16
            ret     lr
; Total bytes of code 24, prolog size 8
38. And
Vector64<byte> And(Vector64<byte> left, Vector64<byte> right)
This method ands the vector elements in the leftand right vector, and returns the result vector.
private Vector64<byte> AndTest(Vector64<byte> left, Vector64<byte> right)
{
  return AdvSimd.And(left, right);
}
// left = <11, 12, 13, 14, 15, 16, 17, 18>
// right = <21, 22, 23, 24, 25, 26, 27, 28>
// Result = <1, 4, 5, 8, 9, 16, 17, 16>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<double> And(Vector64<double> left, Vector64<double> right)
Vector64<short> And(Vector64<short> left, Vector64<short> right)
Vector64<int> And(Vector64<int> left, Vector64<int> right)
Vector64<long> And(Vector64<long> left, Vector64<long> right)
Vector64<sbyte> And(Vector64<sbyte> left, Vector64<sbyte> right)
Vector64<float> And(Vector64<float> left, Vector64<float> right)
Vector64<ushort> And(Vector64<ushort> left, Vector64<ushort> right)
Vector64<uint> And(Vector64<uint> left, Vector64<uint> right)
Vector64<ulong> And(Vector64<ulong> left, Vector64<ulong> right)
Vector128<byte> And(Vector128<byte> left, Vector128<byte> right)
Vector128<double> And(Vector128<double> left, Vector128<double> right)
Vector128<short> And(Vector128<short> left, Vector128<short> right)
Vector128<int> And(Vector128<int> left, Vector128<int> right)
Vector128<long> And(Vector128<long> left, Vector128<long> right)
Vector128<sbyte> And(Vector128<sbyte> left, Vector128<sbyte> right)
Vector128<float> And(Vector128<float> left, Vector128<float> right)
Vector128<ushort> And(Vector128<ushort> left, Vector128<ushort> right)
Vector128<uint> And(Vector128<uint> left, Vector128<uint> right)
Vector128<ulong> And(Vector128<ulong> left, Vector128<ulong> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AndTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            and     v16.8b, v0.8b, v1.8b
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr
; Total bytes of code 24, prolog size 8
39. BitwiseClear
Vector64<byte> BitwiseClear(Vector64<byte> value, Vector64<byte> mask)
This method performs AND of corresponding vector elements in value and complement of mask vector and returns the result vector containing the result of this operation.
private Vector64<byte> BitwiseClearTest(Vector64<byte> value, Vector64<byte> mask)
{
  return AdvSimd.BitwiseClear(value, mask);
}
// value = <255, 255, 255, 255, 255, 255, 255, 255>
// mask = <1, 2, 4, 8, 16, 32, 64, 128>
// Result = <254, 253, 251, 247, 239, 223, 191, 127>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<double> BitwiseClear(Vector64<double> value, Vector64<double> mask)
Vector64<short> BitwiseClear(Vector64<short> value, Vector64<short> mask)
Vector64<int> BitwiseClear(Vector64<int> value, Vector64<int> mask)
Vector64<long> BitwiseClear(Vector64<long> value, Vector64<long> mask)
Vector64<sbyte> BitwiseClear(Vector64<sbyte> value, Vector64<sbyte> mask)
Vector64<float> BitwiseClear(Vector64<float> value, Vector64<float> mask)
Vector64<ushort> BitwiseClear(Vector64<ushort> value, Vector64<ushort> mask)
Vector64<uint> BitwiseClear(Vector64<uint> value, Vector64<uint> mask)
Vector64<ulong> BitwiseClear(Vector64<ulong> value, Vector64<ulong> mask)
Vector128<byte> BitwiseClear(Vector128<byte> value, Vector128<byte> mask)
Vector128<double> BitwiseClear(Vector128<double> value, Vector128<double> mask)
Vector128<short> BitwiseClear(Vector128<short> value, Vector128<short> mask)
Vector128<int> BitwiseClear(Vector128<int> value, Vector128<int> mask)
Vector128<long> BitwiseClear(Vector128<long> value, Vector128<long> mask)
Vector128<sbyte> BitwiseClear(Vector128<sbyte> value, Vector128<sbyte> mask)
Vector128<float> BitwiseClear(Vector128<float> value, Vector128<float> mask)
Vector128<ushort> BitwiseClear(Vector128<ushort> value, Vector128<ushort> mask)
Vector128<uint> BitwiseClear(Vector128<uint> value, Vector128<uint> mask)
Vector128<ulong> BitwiseClear(Vector128<ulong> value, Vector128<ulong> mask)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:BitwiseClearTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            bic     v16.8b, v0.8b, v1.8b
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr
; Total bytes of code 24, prolog size 8
40. BitwiseSelect
Vector64<byte> BitwiseSelect(Vector64<byte> select, Vector64<byte> left, Vector64<byte> right)
This method sets each bit in the result to the corresponding bit from the left vector when the select vector’s bit was 1, otherwise from the right vector.
private Vector64<byte> BitwiseSelectTest(Vector64<byte> select, Vector64<byte> left, Vector64<byte> right)
{
  return AdvSimd.BitwiseSelect(select, left, right);
}
// select = <11, 12, 13, 14, 15, 16, 17, 18>
// left = <21, 22, 23, 24, 25, 26, 27, 28>
// right = <31, 32, 33, 34, 35, 36, 37, 38>
// Result = <21, 36, 37, 40, 41, 52, 53, 52>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<double> BitwiseSelect(Vector64<double> select, Vector64<double> left, Vector64<double> right)
Vector64<short> BitwiseSelect(Vector64<short> select, Vector64<short> left, Vector64<short> right)
Vector64<int> BitwiseSelect(Vector64<int> select, Vector64<int> left, Vector64<int> right)
Vector64<long> BitwiseSelect(Vector64<long> select, Vector64<long> left, Vector64<long> right)
Vector64<sbyte> BitwiseSelect(Vector64<sbyte> select, Vector64<sbyte> left, Vector64<sbyte> right)
Vector64<float> BitwiseSelect(Vector64<float> select, Vector64<float> left, Vector64<float> right)
Vector64<ushort> BitwiseSelect(Vector64<ushort> select, Vector64<ushort> left, Vector64<ushort> right)
Vector64<uint> BitwiseSelect(Vector64<uint> select, Vector64<uint> left, Vector64<uint> right)
Vector64<ulong> BitwiseSelect(Vector64<ulong> select, Vector64<ulong> left, Vector64<ulong> right)
Vector128<byte> BitwiseSelect(Vector128<byte> select, Vector128<byte> left, Vector128<byte> right)
Vector128<double> BitwiseSelect(Vector128<double> select, Vector128<double> left, Vector128<double> right)
Vector128<short> BitwiseSelect(Vector128<short> select, Vector128<short> left, Vector128<short> right)
Vector128<int> BitwiseSelect(Vector128<int> select, Vector128<int> left, Vector128<int> right)
Vector128<long> BitwiseSelect(Vector128<long> select, Vector128<long> left, Vector128<long> right)
Vector128<sbyte> BitwiseSelect(Vector128<sbyte> select, Vector128<sbyte> left, Vector128<sbyte> right)
Vector128<float> BitwiseSelect(Vector128<float> select, Vector128<float> left, Vector128<float> right)
Vector128<ushort> BitwiseSelect(Vector128<ushort> select, Vector128<ushort> left, Vector128<ushort> right)
Vector128<uint> BitwiseSelect(Vector128<uint> select, Vector128<uint> left, Vector128<uint> right)
Vector128<ulong> BitwiseSelect(Vector128<ulong> select, Vector128<ulong> left, Vector128<ulong> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:BitwiseSelectTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;  V02 arg2         [V02,T02] (  3,  3   )   simd8  ->   d2         HFA(simd8) 
;# V03 OutArgs      [V03    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            mov     v16.8b, v0.8b
            bsl     v16.8b, v1.8b, v2.8b
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr
; Total bytes of code 28, prolog size 8
41. Ceiling
Vector64<float> Ceiling(Vector64<float> value)
This method rounds each vector element of value having floating-point values to integral floating-point values of the same size using the Round towards Plus Infinity rounding mode, and returns the result. As per ARM docs, a zero input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a NaN is propagated as for normal arithmetic.
private Vector64<float> CeilingTest(Vector64<float> value)
{
  return AdvSimd.Ceiling(value);
}
// value = <11.5, 12.5>
// Result = <12, 13>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector128<float> Ceiling(Vector128<float> value)
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector128<double> Ceiling(Vector128<double> value)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:CeilingTest(System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector64`1[Single]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            frintp  v16.2s, v0.2s
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr
; Total bytes of code 24, prolog size 8
42. CeilingScalar
Vector64<double> CeilingScalar(Vector64<double> value)
Same as Ceiling above but operates at scalar level.
private Vector64<double> CeilingScalarTest(Vector64<double> value)
{
  return AdvSimd.CeilingScalar(value);
}
// value = <11.5>
// Result = <12>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<float> CeilingScalar(Vector64<float> value)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:CeilingScalarTest(System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[Double]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            frintp  d16, d0
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr
; Total bytes of code 24, prolog size 8
43. CompareEqual
Vector64<byte> CompareEqual(Vector64<byte> left, Vector64<byte> right)
This method compares corresponding vector elements from left with those in right, and if the comparison is equal sets every bit of the corresponding vector element in the result vector to one, otherwise sets every bit of the corresponding vector element in the result vector to  zero and returns the result vector.
private Vector64<byte> CompareEqualTest(Vector64<byte> left, Vector64<byte> right)
{
  return AdvSimd.CompareEqual(left, right);
}
// left = <11, 12, 13, 14, 15, 16, 17, 18>
// right = <11, 22,13, 14, 25, 26, 27, 28>
// Result = <255, 0, 255, 255, 0, 0, 0, 0>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<short> CompareEqual(Vector64<short> left, Vector64<short> right)
Vector64<int> CompareEqual(Vector64<int> left, Vector64<int> right)
Vector64<sbyte> CompareEqual(Vector64<sbyte> left, Vector64<sbyte> right)
Vector64<float> CompareEqual(Vector64<float> left, Vector64<float> right)
Vector64<ushort> CompareEqual(Vector64<ushort> left, Vector64<ushort> right)
Vector64<uint> CompareEqual(Vector64<uint> left, Vector64<uint> right)
Vector128<byte> CompareEqual(Vector128<byte> left, Vector128<byte> right)
Vector128<short> CompareEqual(Vector128<short> left, Vector128<short> right)
Vector128<int> CompareEqual(Vector128<int> left, Vector128<int> right)
Vector128<sbyte> CompareEqual(Vector128<sbyte> left, Vector128<sbyte> right)
Vector128<float> CompareEqual(Vector128<float> left, Vector128<float> right)
Vector128<ushort> CompareEqual(Vector128<ushort> left, Vector128<ushort> right)
Vector128<uint> CompareEqual(Vector128<uint> left, Vector128<uint> right)
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector128<double> CompareEqual(Vector128<double> left, Vector128<double> right)
Vector128<long> CompareEqual(Vector128<long> left, Vector128<long> right)
Vector128<ulong> CompareEqual(Vector128<ulong> left, Vector128<ulong> right)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:CompareEqualTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            cmeq    v16.8b, v0.8b, v1.8b
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr
; Total bytes of code 24, prolog size 8
44. CompareEqualScalar
Vector64<double> CompareEqualScalar(Vector64<double> left, Vector64<double> right)
This method compares corresponding floating-point values from the left and right vector, and if the comparison is equal sets every bit of the corresponding vector element in the result vector to one, otherwise sets every bit of the corresponding vector element in the result vector to zero and return the result vector.
private Vector64<double> CompareEqualScalarTest(Vector64<double> left, Vector64<double> right)
{
  return AdvSimd.Arm64.CompareEqualScalar(left, right);
}
// left = <11.5>
// right = <11.5>
// Result = <NaN>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<long> CompareEqualScalar(Vector64<long> left, Vector64<long> right)
Vector64<float> CompareEqualScalar(Vector64<float> left, Vector64<float> right)
Vector64<ulong> CompareEqualScalar(Vector64<ulong> left, Vector64<ulong> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:CompareEqualScalarTest(System.Runtime.Intrinsics.Vector64`1[Double],System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[Double]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            fcmeq   d16, d0, d1
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr
; Total bytes of code 24, prolog size 8
45. CompareGreaterThan
Vector64<byte> CompareGreaterThan(Vector64<byte> left, Vector64<byte> right)
This method compares corresponding vector elements in the left and right vector, and if the left’s  value is greater than the right’s value sets every bit of the corresponding vector element in the result vector to one, otherwise sets every bit of the corresponding vector element in the result vector to zero and return the result vector.
private Vector64<byte> CompareGreaterThanTest(Vector64<byte> left, Vector64<byte> right)
{
  return AdvSimd.CompareGreaterThan(left, right);
}
// left = <31, 12, 33, 34, 15, 16, 17, 18>
// right = <21, 22, 23, 24, 25, 26, 27, 28>
// Result = <255, 0, 255, 255, 0, 0, 0, 0>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<short> CompareGreaterThan(Vector64<short> left, Vector64<short> right)
Vector64<int> CompareGreaterThan(Vector64<int> left, Vector64<int> right)
Vector64<sbyte> CompareGreaterThan(Vector64<sbyte> left, Vector64<sbyte> right)
Vector64<float> CompareGreaterThan(Vector64<float> left, Vector64<float> right)
Vector64<ushort> CompareGreaterThan(Vector64<ushort> left, Vector64<ushort> right)
Vector64<uint> CompareGreaterThan(Vector64<uint> left, Vector64<uint> right)
Vector128<byte> CompareGreaterThan(Vector128<byte> left, Vector128<byte> right)
Vector128<short> CompareGreaterThan(Vector128<short> left, Vector128<short> right)
Vector128<int> CompareGreaterThan(Vector128<int> left, Vector128<int> right)
Vector128<sbyte> CompareGreaterThan(Vector128<sbyte> left, Vector128<sbyte> right)
Vector128<float> CompareGreaterThan(Vector128<float> left, Vector128<float> right)
Vector128<ushort> CompareGreaterThan(Vector128<ushort> left, Vector128<ushort> right)
Vector128<uint> CompareGreaterThan(Vector128<uint> left, Vector128<uint> right)
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector128<double> CompareGreaterThan(Vector128<double> left, Vector128<double> right)
Vector128<long> CompareGreaterThan(Vector128<long> left, Vector128<long> right)
Vector128<ulong> CompareGreaterThan(Vector128<ulong> left, Vector128<ulong> right)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:CompareGreaterThanTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            cmhi    v16.8b, v0.8b, v1.8b
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr
; Total bytes of code 24, prolog size 8